Abstract 

Background: The benzoxazinoids 2,4-dihydroxy-1,4-benzoxazin-3-one (DIBOA) and 2,4-dihydroxy-7-methoxy-1,4-benzoxazin-3-one 
(DIMBOA) are key defense compounds present in major agricultural crops such as maize and 
wheat. Their biosynthesis involves nine enzymes thought to form a linear pathway leading to the storage of DI(M)BOA 
as glucoside conjugates. Seven of the genes (Bx1-Bx6 and Bx8) form a cluster at the tip of the short arm of 
maize chromosome 4 that includes four P450 genes (Bx2-5) belonging to the same CYP71C subfamily. The origin of 
this cluster is unknown. 

Results: We show that the pathway appeared following several duplications of the TSA gene (α-subunit of 
tryptophan synthase) and of a Bx2-like ancestral CYP71C gene and the recruitment of Bx8 before the radiation of 
Poaceae. The origins of Bx6 and Bx7 remain unclear. We demonstrate that the Bx2-like CYP71C ancestor was not 
committed to the benzoxazinoid pathway and that after duplications the Bx2-Bx5 genes were under positive 
selection on a few sites and underwent functional divergence, leading to the current specific biochemical 
properties of the enzymes. The absence of synteny between available Poaceae genomes involving the Bx gene 
regions is in contrast with the conserved synteny in the TSA gene region. 

Conclusions: These results demonstrate that rearrangements following duplications of an IGL/TSA gene and of a 
CYP71C gene probably resulted in the clustering of the new copies (Bx1 and Bx2) at the tip of a chromosome in an 
ancestor of grasses. Clustering favored cosegregation and tip chromosomal location favored gene rearrangements 
that allowed the further recruitment of genes to the pathway. These events, a founding event and elongation 
events, may have been the key to the subsequent evolution of the benzoxazinoid biosynthetic cluster. 
Background 

Plants are sessile organisms which have evolved chem- 
ical and mechanical ways to defend against pathogens, 
herbivores and competitors. The synthesis of toxic com- 
pounds, generally arising from the so-called secondary 
metabolism is a hallmark of plant defense. Among the 
enzymes often involved in secondary metabolism and in 
particular in the synthesis of defense compounds and 
toxins in plants are the P450 enzymes [1]. These are 
heme-dependent oxidase enzymes that generally catalyze 
the insertion of one oxygen atom in a substrate after 

* Correspondence: rfeyer@sophia.inra.fr 

Institut National de la Recherche Agronomique, UMR 1355 Institut Sophia 
Agrobiotech, Centre National de la Recherche Scientifique, UMR 7254, 
Universite de Nice Sophia Antipolis, Sophia-Antipolis, France 

Bio Med Central 



activation of molecular oxygen. The most common reac- 
tion catalyzed by this protein family is hydroxylation, 
but P450s are involved in a wide variety of catalyses such 
as dimerizations, isomerizations, dehydrations or 
reductions [2,3]. P450 proteins represent a large protein 
family very well represented in plants. For example 272 
cytochrome P450 genes (CYP genes) are present in the 
Arabidopsis genome, including 26 pseudogenes [3]. This 
superfamily groups together proteins with as low as 20% 
sequence identity. Nevertheless secondary and tertiary 
structures are well conserved throughout the family. For 
instance P450 proteins share some conserved structures 
and sequences linked to properties such as oxygen or 
heme binding. 
In grasses, P450s of the CYP71C subfamily are 
involved in the biosynthesis of the cyclic hydroxamic 
acids 2,4-dihydroxy-1,4-benzoxazin-3-one (DIBOA) and 
2,4-dihydroxy-7-methoxy-1,4-benzoxazin-3-one (DIMBOA). 
These natural compounds are well described as 
natural pesticides and allelochemicals, as chemical 
defense against microbial diseases and herbivory [4,5]. 
Their occurrence depends on the plant species [6]. For 
example, the predominant cyclic hydroxamic acid in rye 
Secale cereale is DIBOA whereas the main cyclic hydroxamic 
acid in maize Zea mays is DIMBOA, which differs 
from DIBOA by an additional methoxy group [5]. The 
presence of cyclic hydroxamic acids in grain crops has 
been known for over 50 years [7]. The four P450 genes 
involved in DIBOA biosynthesis, Bx2 to Bx5 (known as 
CYP71C1 to CYP71C4 in the P450 nomenclature), were 
first described in maize by Frey et al. [8,9]. The final re- 
action steps leading to DIMBOA-glucoside were also 
elucidated in maize [10-12]. The full pathway involves 9 
enzymes (BX1 to BX9) thought to act sequentially in the 
synthesis of DIMBOA-glucoside from indole-3-glycerol 
phosphate (Figure 1). The Bx2 to Bx5 genes are clustered 
on the short arm of maize chromosome 4 [8,9]. 
Genetic analysis indicated that Bx1, Bx6, Bx7 and Bx8 
are close to, or within this cluster, thus grouping genes 
of different families within a short chromosomal region 
[8-10,12]. Upon wounding, an additional O-methylation 
is activated, leading to HDMBOA [13,14] but the gene 
responsible for this reaction is still unknown. The same 
DI(M)BOA biosynthetic pathway has also been 
described in wheat Triticum aestivum, in rye S. cereale 
and in the wild barley Hordeum lechleri, the cultivated 
barley probably having lost the gene cluster [15-17]. The 
four CYP71C genes were cloned and characterized in 
diploid and hexaploid wheat and in wild barley [17-19]. 

A common evolutionary origin of this cluster of maize 
P450 genes by successive gene duplications has been 
proposed [8,16,20]. While it is no longer surprising to 
find biosynthetic gene clusters in bacteria and fungi [21], 
the nature and origin of such clusters in plants and ani- 
mals is less studied. Large gene families such as the CYP 
family are often characterized by multiple gene duplica- 
tions that leave a genomic trace as clusters of related 
genes [22,23]. However, these structural clusters such as 
the 13 CYP71B genes clustered on chromosome 3 of 
Arabidopsis thaliana, the 16 CYP6 genes clustered in 
the mosquito, or the 22 CYP2 loci clustered on mouse 
chromosome 7 [22,24,25] are not known to be co- 
regulated or to participate in a common pathway. Bio- 
synthetic gene clusters are therefore different because 
they comprise non-homologous genes that function col- 
lectively. They have received increasing attention in 
plants [26], where known biosynthetic gene clusters 
serve in elaborating defense compounds such as 



phytoalexins from common precursors. They include the 
clusters for the biosynthesis of thalianol in A. thaliana, 
avenacin in Avena strigosa, momilactone and phytocas- 
sane in Oryza sativa [27] as well as the cyanogenic glu- 
coside biosynthetic clusters in Lotus japonicus, Manihot 
esculenta and Sorghum bicolor [28]. 

The Bx gene cluster of maize is therefore of great 
interest, because it consists of an apparent structural 
cluster of four CYP71C genes in close genomic proxim- 
ity with members of other gene families that, together, 
are known to direct the synthesis of DIMBOA glucoside 
from a common metabolic intermediate, indole 3- 
glycerol phosphate [20]. The sequence of the gene dupli- 
cations, the nature of the ancestral genes and the 
mechanisms leading to the recruitment of several genes 
from different families into this biosynthetic cluster have 
not been determined in detail, yet these are the key 
questions in understanding the evolution of secondary 
metabolic gene clusters in plants [26,27]. 

Here, we take advantage of the information from 
newly sequenced genomes of Poaceae and of the known 
biochemical properties of the enzymes to describe the 
evolutionary origin of the DIMBOA biosynthetic path- 
way. We used a phylogenetic approach to establish the 
sequence of duplications of an ancestral CYP71C gene 
leading to the Bx2-Bx5 cluster in maize. We delineated 
the involvement of critical amino acids in achieving the 
current biochemical specificities of the P450 enzymes. 
We studied the syntenic relationships of genes at the 
interface between primary metabolism and the DIMBOA 
pathway: ZmBx1, ZmTSA and TSA-like, ZmIgl and Igl-like. 
We also searched for synteny around the other 
genes of the Bx cluster to gain insights into the origin of 
the entire DIMBOA pathway. 

Methods 

Sequence data 

The BX2-5 protein and transcript sequences from maize 
[8] were used for BLAST approaches (blastn, blastp and 
tblastn). Sequences with more than 50% amino acid se- 
quence identity with one maize CYP71C were used to 
identify equivalent CYP71C sequences in other Poaceae. 
Searches were made on the BLAST server of NCBI [29] 
for all Poaceae, on the maize genome website [30], on 
the Brachypodium distachyon website [31] and on the 
Phytozome database [32]. P450 sequences were manually 
checked and their annotation corrected when necessary 
based on known P450 motifs [3]. Incomplete sequences 
and pseudogenes were removed from further analysis, 
resulting in 75 sequences representing the CYP71C sub- 
family in Poaceae. Five A. thaliana sequences were 
chosen to root the tree (CYP76C3, CYP76C7, CYP76C4, 
CYP76C1 and CYP76G1). For the studies on Bx1 (and 
Igl, TSA, TSA_like), Bx6, Bx7, Bx8 and Bx9, sequences 
Figure 1 The DIMBOA biosynthetic pathway and cellular compartment localizations. Adapted from [8,12,20]. 



were searched by BLASTP based on maize sequences 
with the same criteria (identity > 50%) and on the same 
databases as for the P450 study. Intron/exon structures 
were determined whenever the genomic sequence 
was available. 
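
The 50% identity criterion used above can be illustrated with a minimal sketch. It assumes that identity is counted over aligned columns in which neither sequence has a gap, which is only one of several conventions (BLAST, for instance, reports identity over the length of the local alignment). The function names and the default threshold argument are hypothetical and are not taken from the tools cited in this section.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: columns with a gap ('-') in either row are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_identity_filter(aligned_query: str, aligned_hit: str,
                           threshold: float = 50.0) -> bool:
    """True when the hit is strictly above the identity threshold."""
    return percent_identity(aligned_query, aligned_hit) > threshold
```

With this convention a hit at exactly 50% identity is rejected, matching the "more than 50%" wording of the text.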

Phylogenetic tree reconstruction 

The selection of the best-fit model of protein evolution 
for the CYP71C subfamily was done with ProtTest software 
version 2.4 [33] based on protein alignment 
obtained with MUSCLE software (available on the web- 
site phylogeny.fr [34]) using the default parameters 
[35-37]. No curation was needed because of the high 
identity between sequences and the correct alignment of 
P450 motifs. The JTT model [38] was then chosen as 
the best fit model of protein evolution for our sequences. 
The tree of the CYP71C subfamily in Poaceae was gen- 
erated on the phylogeny.fr website [34] using maximum 
likelihood as reconstruction method (PhyML program). 



Bootstrapping (100 iterations) was done to document 
branch support. These criteria were also used for the re- 
construction of the other BX phylogenetic trees. 

Positive selection in coding sequences 

The codeml program of the PAML software package 
was used to detect positive selection in the CYP71C sub- 
family [39]. cDNA sequences were aligned based on pro- 
tein alignment by using the RevTrans 1.4 server [40]. 
We used branch and site models which allow ω (the ratio of 
nonsynonymous to synonymous substitution rates) to vary 
among branches in the phylogeny or among sites on the 
alignment [41,42]. First, we compared the one-ratio 
model (M0, one ω estimated for all sites) to the free-ratio 
model (independent ω ratios for each branch) to 
test for the hypothesis of variable ω among branches. 
Then, the M7 (beta distribution of ω ratios, with 0 < ω < 1) 
versus M8 (extension of M7 with a supplementary site 
class with ω > 1 estimated from the dataset) comparisons 
were done to test for positive selection among sites. 
Finally, the models M8 and M8a (ω = 1) were compared 
to determine if for a small fraction of sites the ω estimated 
under M8 was significantly higher than 1 [43,44]. 
Likelihood-ratio tests (LRT) were used to compare models. 
Twice the log-likelihood difference, 2ΔlnL, was compared 
with a χ² distribution with degrees of freedom 
corresponding to the difference in the number of free 
parameters between the two models compared. The branch-site 
model MA was further applied to our data to detect 
positive selection affecting only a few sites on pre- 
specified lineages [45]. Four independent branches were 
studied with this model and Likelihood-ratio tests (LRT) 
were used to compare models with Bonferroni's multiple 
testing corrections. 
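
The likelihood-ratio comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the codeml implementation: the helper names (`chi2_sf`, `lrt_p_value`, `bonferroni_threshold`) are hypothetical, and the χ² tail probability is computed with the closed-form series that holds for integer degrees of freedom, so no external statistics library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution.

    Uses the closed-form series valid for integer degrees of freedom.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-y) * sum_{i=0}^{k-1} y^i / i!
        k = df // 2
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(k))
    # Odd df = 2k + 1:
    # erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{k} y^(j - 1/2) / Gamma(j + 1/2)
    k = (df - 1) // 2
    tail = sum(y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, k + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def lrt_p_value(lnl_null: float, lnl_alt: float, df: int) -> float:
    """p-value for the statistic 2 * (lnL_alt - lnL_null) with df degrees of freedom."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return chi2_sf(statistic, df)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests
```

For example, with four foreground branches tested and a nominal level of 0.05, `bonferroni_threshold(0.05, 4)` gives a per-test level of 0.0125, against which each branch-site p-value would be compared.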

P450 secondary structure 

P450 proteins generally present 13 conserved α-helices 
named from A to L [46]. The maize Bx2-5 proteins were 
analyzed using tools available on the web (Jpred 3 [47], 
Porter [48]) to define the consensus zones correspond- 
ing to these putative helices. Substrate recognition sites 
(SRSs) were localized by homology to the SRSs described 
by Gotoh [49]. 

Functional divergence analysis of BX2 to BX5 
protein clades 

The software DIVERGE2 (Detecting Variability in Evolu- 
tionary Rates among Genes) was used to identify critical 
amino acid residues involved in functional innovations 
after gene duplications [50]. The coefficients of type I 
and type II functional divergence (θI and θII, respectively) 
between two chosen clades were calculated. If 
these coefficients are significantly greater than 0, it means that some 
amino acids were subject to altered selective constraints 
(type I functional divergence) or that a radical shift of 
amino acid physicochemical properties occurred after 
gene duplication and/or speciation (type II functional 
divergence) [50-53]. Thirty-nine sequences comprising 
each BX clade and the sequences associated to the BX2 
clade were aligned using MUSCLE (default parameters). 
This alignment and the equivalent portion of the tree 
were used as input for analyses with the DIVERGE2 
software. We compared BX clades resulting 
from duplication events to each other. For each test, the 
posterior probability of each site to be under functional 
divergence was calculated. Sites with a posterior prob- 
ability Qk > 0.67 (to select only radical cluster-specific 
sites [50-53]) were localized on the maize BX2-5 
alignment. 

BX2 molecular modeling and dockings 

A model for BX2 was generated with the iterative 
threading assembly refinement server (I- TASSER [54]) 
[55,56] using the following templates: (pdb numbers) 



3k9v, 3e6i, 3na0, 2hi4, 3czh and 1izo. We obtained a 
model with a C-score = -1.46, indicating a correct pre- 
diction. The top I-TASSER templates all presented a 
normalized Z-score > 1 with a large protein coverage (al- 
ways superior to 76%), reflecting the high accuracy of 
the alignment. Predictions of binding sites were 
confident with a BS-score > 0.5 for all predictions. Dockings 
were performed using the AutoDock4 software package 
[57,58]. Proteins were first prepared by removing 
water molecules, checking for missing atoms, adding 
non-bonded hydrogens and computing Gasteiger charges 
with AutoDockTools (ADT) 1.5.4 [59]. The protein 
model was then used to construct a grid box with a 
grid-point spacing of 0.375 Å. To dock the indole substrate, 
the input protein was the heme-containing BX2. 
The grid centre was positioned on top of the 
heme and the grid included 40 × 40 × 40 = 64,000 points. Four 
independent Lamarckian genetic algorithm searches 
(250,000 and 2,500,000 evaluations) were run. For each 
analysis, the solutions were compared to determine if 
the results were reliable and reproducible. Interacting 
residues were also compared to determine the conserved 
residues in contact with indole. 
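
As a worked check of the grid geometry described above, the short sketch below recomputes the number of grid points and the physical edge length of the box from the stated spacing. The function name `grid_box` is hypothetical, and the point count follows the convention used in the text (the product of the three dimensions).

```python
def grid_box(npts=(40, 40, 40), spacing=0.375):
    """Return (number of grid points, edge lengths in angstroms).

    Counting convention taken from the text: points = nx * ny * nz.
    """
    n_points = npts[0] * npts[1] * npts[2]
    edges = tuple(n * spacing for n in npts)
    return n_points, edges
```

With the default arguments this gives 64,000 points and a cubic box of 15 Å per side, which is large enough to enclose a small ligand such as indole above the heme.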

Synteny of the Bx genes 

Three approaches were used to analyze the synteny of 
maize Bx genes within available Poaceae genomes. We 
used the SyMAP v3.4 (Synteny Mapping and Analysis 
Program) to identify and display genome synteny align- 
ments for Z. mays, O. sativa, B. distachyon, S. bicolor 
and S. italica genomes [60]. The Plant-Synteny site was 
also used to study synteny between Z. mays, Triticum, 
O. sativa, S. bicolor and B. distachyon [61]. Maizese- 
quence.org was used to identify putative syntenies be- 
tween maize and O. sativa and S. bicolor. 

Results 

Phylogenetic analysis of the genes related to Bx2-5 in the 
CYP71C subfamily 

We aligned all available CYP71C sequences and recon- 
structed a phylogenetic tree to determine the evolution- 
ary origin of the P450 genes of the Bx cluster in Poaceae 
(Figure 2 and Additional file 1). The tree we obtained 
showed that the maize BX2-5 sequences had orthologs 
distributed in four branches, boxed in Figure 2 as the 
BX2-BX5 clades. Each of these four branches was strongly 
supported (bootstrap values 91-100%). The four clades 
contained sequences of biochemically characterized 
CYP71C enzymes such as those of maize [8,16] and 
wheat [19] as well as two sequences encoding P450s 
from rice that have not yet been biochemically charac- 
terized. Moreover, while the BX3, 4 and 5 clades were 
monophyletic, the BX2 clade was included in a parent 
clade that contained fourteen "BX2-like" sequences from 
Figure 2 Phylogenetic tree of CYP71C sequences rooted with five CYP76 sequences of Arabidopsis thaliana. Maximum likelihood tree 
with branch lengths drawn to scale in terms of the number of substitutions per site. Bold branches indicate branches with ω > 1 according to the 
free ratio model of codeml. BX clades are boxed. Dots represent gene duplications. Only nodes with bootstrap values superior to 75% are shown. 
The full tree is shown in Additional file 1. 



non-benzoxazinoid producers: S. bicolor, S. italica and 
O. sativa. The rest of the tree was composed of at least 
four branches with more distant sequences of the 
CYP71C subfamily from maize, B. distachyon, S. italica, 
S. bicolor and O. sativa. 

Multiple intron losses in the CYP71C subfamily 

Frey et al. in their landmark paper on the Bx cluster [8] 
suggested that the position of introns in the CYP71C 
genes was indicative of a common evolutionary origin. 
We therefore located the position of introns in all the 
available genomic CYP71C sequences (Additional file 1). 
Only two intron positions were found, both in phase 
zero, and located at conserved positions, 192 (intron 1) 
and 336 (intron 2) of BX2 (hereafter all amino acid posi- 
tions are given in terms of equivalent positions of maize 
BX2). Thirty four genes had both common introns while 



twenty five sequences had only intron 2. All genes in the 
BX3, BX4 and BX5 clades had both introns, except for 
maize BX4 that had only intron 2. The three genomic 
sequences of the BX2 clade had only intron 2. Intron 2 
is an ancestral intron found in most plant P450s [22], 
while the conserved position of intron 1 suggested that 
it had the same origin among all the sequences. The 
presence or absence of the introns did not follow a sim- 
ple phylogeny. This indicated independent intron losses 
or gains in the different branches. Identification of para- 
log/ortholog relationships in the phylogeny allowed us 
to place duplication and speciation events on the tree. 
Accordingly, the most likely hypothesis is that intron 1 
was introduced after the first duplication in the CYP71C 
subfamily and that more than a dozen independent in- 
tron losses have occurred. The alternative hypothesis of 
an equally high number of independent gains of intron 1 
occurring at the same position and in the same phase is 
unlikely. 

Positive selection in the CYP71C coding sequences 

We analyzed the evolution of the CYP71C subfamily and 
in particular the evolution of the maize Bx2-5 genes. We 
hypothesized that, after duplication and/or speciation 
events, the four CYP71C genes were subjected to selec- 
tion. On one hand, the proteins maintained their P450 
structure and overall catalytic activity (redox partner 
binding, dioxygen binding and activation), and on the 
other hand, they each acquired a distinct and high sub- 
strate specificity [8,16]. The conserved features of the 
four proteins i.e. helical structures, SRS regions, and 
conserved P450 motifs are illustrated in Figure 3. This 
balance of P450 structure conservation and substrate 
specificity would imply that genes are globally under 
purifying or neutral selection (ω < 1 or ω = 1, respectively), 
explaining the high sequence identity between 
proteins, and that some particular sites are under strong 
positive selection (ω > 1), leading to the substrate specificity. 
This hypothesis was evaluated by testing for positive 
selection among specific lineages, among sites and 
among specific sites in specific lineages on our phyl- 
ogeny. Branch and site models were first used to test the 
hypothesis of heterogeneous levels of selection among 
lineages and to test for positive selection among sites. 
The branch model would allow us to test if positive se- 
lection existed among the various branches. The site 
models permit ω to vary among sites. As the positively 
selected sites in proteins are generally very few, focusing 
on the overall sequence would fail to detect these 
selected sites as the ω value on the overall alignment 
would be below 1. These tests would thus allow us 
to detect sites under positive selection in the CYP71C 
subfamily. The changes at these sites being favored dur- 
ing evolution would be potentially important for the spe- 
cificity of the proteins. The comparison between the 
one-ratio model M0 (one ω ratio for all sites calculated 
from the data) and the free-ratio model (independent ω 
ratios for each branch) revealed a heterogeneous selection 
level among lineages (2ΔlnL = 725.58, p < 0.01). 
The three branches that define the BX3, 4 and 5 clades 
(Figure 2) showed ω values greater than 1, supporting the 
hypothesis of adaptive evolution, while the branch defining 
the BX2 clade did not. The test of positive selection 
among sites, M7/M8, was statistically significant 
(2ΔlnL = 680.30, p < 0.01). This suggested that the ω 
ratio was variable among sites and that about 17% of the 
sites were under positive selection with ω = 1.69. The 
M8/M8a test was also significant, meaning that the estimated 
ω in M8 was statistically different from 1. The 
M8 model identified 52 sites under positive selection 
with a posterior probability superior to 0.95 (Additional 



file 2). We also used the branch-site model MA to go 
further into the specific evolution of each group of the 4 
P450s of the DIMBOA-biosynthetic pathway. The use of 
this model allows the detection of positively selected 
sites on prespecified lineages. The phylogenetic tree 
shows the duplicated origin of the four genes, repre- 
sented by the four Bx clades on Figure 2. Our aim was 
to detect positive selection that affected sites in each of 
the four clades by specifying the foreground branches 
during the tests. Bonferroni's corrections for multiple 
testing were done. The test returned sites under positive 
selection (posterior probability > 0.5) in the BX2 and 
BX4 clades, but the associated ω2 was statistically 
greater than 1 (α = 0.10) only for BX2 sequences (ω2 = 6.44). 
Omitting sites in the membrane anchor not involved in 
substrate specificity, three sites were identified for BX2, 
including one site in the E/F loop and another one in 
SRS5 (Figure 3). 

Functional divergence analyses of the BX P450 enzymes 

Tests of functional divergence with the DIVERGE2 software 
were performed to identify critical amino acid residues 
in each of the four BX clades. Functionally 
divergent sites may explain the substrate specificity 
of each protein. Thirty-nine protein sequences, including 
the sequences of the four BX clades and 
the BX2-like sequences, were aligned, and this alignment and 
the equivalent portion of the tree were used for analyses of 
type I and type II functional divergences among BX 
clades. Clades of BX proteins originating from gene 
duplications were compared to each other. We also 
compared the BX2 clade versus the larger (BX4(BX3/BX5)) 
clade and the BX4 clade versus the 
(BX3/BX5) clade. Sites with significant posterior 
probability Qk > 0.67 were localized on the maize 
BX2-5 protein alignment (Figure 3). 

Comparisons with proteins from the BX4 clade 
showed no type I or II functional divergence. The comparison 
of the BX2 clade with the (BX4(BX3/BX5)) clade 
returned a statistically significant type I functional divergence 
(θI = 0.151 ± 0.067; LRT = 5.10; p < 0.05). Met 77 
of maize BX2, localized within the first P450 motif, was 
returned with Qk > 0.67. The type I functional divergence 
test between the BX2 and BX3 clades was statistically 
significant (θI = 0.353 ± 0.084; LRT = 17.610; 
p < 0.05) and identified 5 sites. One site was on P450 motif 
1, one in SRS1, one in loop E/F, another one in SRS2, 
and the last one was after SRS5. 

Functional divergence of type II was also found between 
these two clades (θII = 0.140 ± 0.055). Twenty 
sites were identified, among which 14 amino acid residues 
differed between maize BX3 and BX2. 

The type I functional divergence test between the BX2 
and BX5 clades was statistically significant (θI = 0.294 ± 
Figure 3 Maize BX2-5 protein alignment. P450 motifs are indicated in purple and the 6 SRSs localizations are indicated in orange. Sites 
potentially under positive selection according to the MA model from codeml analyses are in blue. Sites under type I functional divergence are in 
grey and sites under type II functional divergence are in pink. The 13 putative helices are marked in dotted boxes. Arrows indicate ten potentially 
important sites for the substrate specificity of each protein. Red highlighted sites represent the four residues (Leu253, Ala341, Thr345 and Ile527) 
in contact with indole according to docking results (AutoDock4). 



0.097; LRT = 9.216; p < 0.05) with one site (Glu 451) at 
Qk > 0.67. The type II functional divergence test (θII = 
0.214 ± 0.054) returned twenty-five sites. In the comparison 
of the BX3/BX5 clades, functional divergence of type 
I (θI = 0.417 ± 0.066; LRT = 39.688; p < 0.05) was found 
on 7 sites. All these sites differed between maize BX3 
and BX5, suggesting a radical shift in the rate of 



evolution for these amino acids. The type II functional 
divergence test returned a θII equal to 0.222 ± 0.0356 
and identified 25 sites which all differed between maize 
BX3 and BX5. They were placed throughout the protein 
alignment with two sites in SRS1, three sites in SRS2, 
two sites in SRS3, three sites in SRS4, one in SRS5, two 
in the E/F loop and two in the F/G loop. 
Comparisons of maize BX2-BX5 sites subjected to positive 
selection and to functional divergence 

Positions on the maize BX2-5 alignment with at least 
one site under positive selection and/or functional diver- 
gence were checked. Among them, we focused on sites 
showing either amino acid conservation or four different 
amino acids. These sites could be involved in the specific 
evolution of the maize BX2-BX5 functions. Thirty two 
positions were extracted. Among them, 23 involved one 
or more radical biochemical changes, potentially leading 
to substrate specificity. Nine sites were identified when 
focusing on sites within SRSs or in the F and G helices 
or in the F/G loop (Figure 3). One site was added to 
these critical amino acids: Met 234 of BX2, located just 
before the F-helix, which is both positively selected and 
under type II functional divergence. This site seems to be very important 
from a selection point of view as it implies a radical biochemical 
shift from positively charged to non-polar. 

Maize BX2 modeling and indole docking 

Homology modeling was used to understand the relation- 
ship between sites under positive selection and/or func- 
tional divergence and protein substrate specificity. The 
model was also used in docking approaches to gain insight 
into the specific interactions of the protein with its sub- 
strate indole. Indole was docked into the heme-containing 
BX2 protein model and the residues in contact with indole 
in the 40 computed docked conformations were com- 
pared. Four protein residues were repeatedly found, 
namely Leu 253 in 35/40 conformations, Ile 527 in 36/40 
conformations and Ala 341, Thr 345 and the heme 
molecule in all docked conformations. The indole 
secondary amine was oriented towards 
Ala 341 and Ile 527 in 31/40 conformations. 
The four protein residues (Figures 3 and 4A) are all in 
SRS regions, and three were identical in the other BX 
P450s. The fourth residue, the apolar Ile 527, was 
replaced by a polar Thr in BX4 and BX3 and by an 
apolar Met in BX5. When looking at the spatial 
localization of all sites previously identified as under 
positive selection and/or under functional divergence, 
two sites, Ser 156 and Cys 157, were oriented toward 
the active site. Both sites were identified above among 
the ten sites potentially important for the substrate spe- 
cificities (Figures 3 and 4B). 
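
The tally of residues contacting indole across docked conformations can be sketched as below. This is a minimal illustration, not the procedure used with AutoDock4: it assumes the contacting residues of each pose have already been extracted into a collection of labels, and the function names and the `min_fraction` cutoff are hypothetical.

```python
from collections import Counter


def contact_frequencies(poses):
    """Fraction of docked poses in which each residue contacts the ligand.

    `poses` is an iterable of collections of residue labels, one per pose.
    """
    pose_sets = [set(p) for p in poses]  # count each residue once per pose
    total = len(pose_sets)
    if total == 0:
        return {}
    counts = Counter(res for pose in pose_sets for res in pose)
    return {res: n / total for res, n in counts.items()}


def recurrent_contacts(poses, min_fraction=0.75):
    """Residues found in at least `min_fraction` of the poses, sorted by label."""
    freqs = contact_frequencies(poses)
    return sorted(res for res, f in freqs.items() if f >= min_fraction)
```

Applied to the counts reported in the text, a residue seen in 35 of 40 conformations has a frequency of 0.875 and one seen in 36 of 40 a frequency of 0.9, so both would pass a cutoff of 0.75.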

Synteny of the Bx2-5 genes and their homologs 
among Poaceae 

We analyzed the synteny of the maize Bx2-5 genes to 
determine whether it was conserved in the genomes of 
other Poaceae (Figure 5). Although synteny blocks were 
detected between maize chromosome 4 and O. sativa, S. 
bicolor, B. distachyon and S. italica genomes, none of 
them contained genes homologous to the Bx2-5 genes. 



All the CYP71C homologs included in our phylogenetic 
analysis from those species were found in other genome 
locations. The cluster formed by S. italica CYP71C88, 
C89 and C92 for instance, was syntenic to the cluster of 
maize CYP71C36, C56 and C57 on chromosome 2 
(Figure 5A) and phylogenetically distant from the Bx 
cluster. Moreover, non-Bx genes in the vicinity of the 
Bx2-5 cluster did not show conservation of synteny. 
Therefore, in all currently available genomes, the position 
of the Bx2-5 genes is unique to maize. 

We also studied the phylogeny and syntenic relation- 
ships of the other maize Bx genes (ZmBx) involved in 
the DIMBOA-biosynthetic pathway. Their possible Bx 
orthologs in other Poaceae genomes were located on 
chromosomes (or scaffolds for S. italica) and their in- 
tron positions were mapped. 

Phylogeny and synteny of the ZmBx1, ZmTSA, ZmIgl and 
ZmIgl_like genes 

The Bx1 and Igl genes are thought to result from duplications 
of the TSA gene [62]. The twenty-nine most 
closely related sequences from Z. mays, H. lechleri, T. 
aestivum, S. bicolor, S. italica, B. distachyon, O. sativa 
and Hordeum vulgare were analyzed. On the phylogenetic 
tree, IGL/BX1 and TSA/TSA_like sequences formed 
two distinct clades. In the IGL/BX1 clade, ZmBX1 and 
T. aestivum BX1 were grouped together (Figure 6). The 
IGL sequences from Panicoideae were also closely 
related to BX1. A single branch included IGL from 
Pooideae and Ehrhartoideae. At first glance, these 
sequences followed the Poaceae phylogenetic history 
but this did not explain the origin of BX1 paralogs in 
T. aestivum, which belongs to the Pooideae (Figure 7). 
A likely explanation is that two duplication events of an 
Igl ancestor occurred before the separation of both 
Pooideae-Ehrhartoideae and Panicoideae (Figure 6), fol- 
lowed by reciprocal losses of paralogs in the Panicoideae 
and in the Pooideae-Ehrhartoideae lineages. The ancestral 
intron pattern of ZmBx1 supports this view (Figure 6), but 
additional genome sequences from Pooideae are needed 
to validate it. TSA and TSA_like proteins form the second 
clade on our tree. They probably originated from a more 
basal duplication of a "TSA ancestor", leading to the IGL/ 
BX1 and TSA/TSA_like lineages. 

We did not detect blocks of synteny in B. distachyon, S. 
italica, O. sativa and S. bicolor that contained any genes 
homologous to ZmBx1. However, the maize TSA_like/Igl 
and TSA regions on chromosomes 1 and 7, respectively, 
showed syntenic blocks in common within each of the 
four other Poaceae genomes. The results imply that 
these genes arose early during the evolution of Poaceae 
and maintained their syntenic relationships. Bx1 thus
appears to be the only paralog of the TSA/IGL genes
that lacks conserved synteny, and its location following
rearrangement separated its evolutionary fate from that
of the other TSA/IGL genes.

Figure 4 Indole docking in the maize BX2 model. Indole is colored in green. Heme is represented in licorice. VMD was used to create the
pictures [88]. A. Overall fold and secondary structure contents of maize BX2. The model is represented as NewCartoons and colored as follows:
α-helices in purple, 3₁₀-helices in dark blue, turns in light blue, β-strands in yellow and coils in white. The four red residues (Leu253, Ala341,
Thr345 and Ile527) correspond to sites in contact with indole according to docking results (AutoDock4). B. Localization of the ten important sites
(in dark blue) for maize CYP71C substrate specificity. The structures in red correspond to SRSs. Ser 156 and Cys 157 are inside SRSs and point
towards the active site.

Phylogeny and synteny of the ZmBx6, 7, 8 and 9 genes 

ZmBX6, the 2-oxoglutarate-dependent dioxygenase 
which catalyses the hydroxylation reaction at C-7 of 
DIBOA, was compared to the 15 closest sequences from 
Poaceae to reconstruct a phylogenetic tree (Figure 8). 
The ZmBX6 protein was most closely related to 
Sb10g006910 of S. bicolor (59.9% identity) and to
Si010340 of S. italica (65.7% identity). No sequences of 
Bx6 orthologs are currently available from wheat or wild 
barley. Although genetic mapping studies place the 
ZmBx6 gene in the ZmBx cluster on maize chromosome
4 [11,12], we did not find ZmBx6 in the genomic
sequence but found a close paralog (76.5% identity at the
protein level) isolated on the long arm of chromosome 
2. As the genomic sequence close to Bx4 contains gaps 
of undetermined length, it is likely that Bx6 is lacking in 
the current version of the maize genome. Therefore, 
synteny conservation around the ZmBx6 gene cannot be 
studied presently. 
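
The percent identities quoted in this section compare pairs of aligned protein sequences. As a minimal sketch of how such a figure can be obtained, the function below counts identical residues over the aligned columns where neither sequence has a gap. This definition, the function name `percent_identity` and the toy sequences are assumptions made for illustration; the text does not state which definition or tool was used.

```python
# Minimal sketch: percent identity between two pre-aligned sequences.
# Assumption: identity is computed over columns where neither sequence
# has a gap ("-"); other definitions (e.g. over alignment length) exist.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of identical residues over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment, not taken from any BX protein.
toy_a = "MKT-AYLL"
toy_b = "MRTQAYIL"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Because the choice of denominator changes the value, identities computed this way are only comparable with one another when the same definition is applied throughout.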

ZmBX7 is a member of the large O-methyltransferase 
gene family, but paralogs were found only when the
BLASTP search cutoff was lowered to 40% identity. The closest
homologs were found in S. italica (Si022355, 62.9% iden- 
tity and Si010415, 46.2% identity) and B. distachyon 
(Bradi1g47030, 49.9% identity). The corresponding genes
had a common intron with ZmBx7 in phase 0. ZmBx7
was located on the short arm of maize chromosome 4,
at about 15 Mb from Bx2, with over 360 annotated
genes separating it from the Bx cluster. Its genetic
proximity was previously reported [11,12]. The ZmBx7
paralog of B. distachyon was on chromosome 1 and the two
paralogs of S. italica were present on scaffolds 3 and 7.
No single-copy orthologs were common to maize and these
two species in the vicinity of the O-methyltransferase
genes.

Figure 5 Chromosomal locations of maize Bx and related genes. A. Chromosomal locations of all maize genes used in this study. In brown
are genes related to Bx1, TSA, Igl and Igl_like. In red are genes included in the phylogenetic analysis of the CYP71C subfamily. Bx6 and Bx6 paralog
are in blue, Bx7 in green, Bx8 and Bx9 in grey. Arrows indicate the gene orientation. The chromosomes are taken from the maizesequence.org
website [30]. B. The Bx gene cluster (~264 kb) on maize chromosome 4. Gene length is not to scale. 085303 is the GRMZM2G085303 gene of
unknown function. 085854 is the GRMZM2G085854 gene, an uncharacterized glucosyltransferase that has 27% amino acid identity with BX8. Bx7 is
at about 15 Mb towards the centromere (right), separated from the cluster by about 360 other genes. Bx6 is not found in the current release of
the maize genome. Genetic evidence places it within 7 cM of Bx3 and Bx4 [11,12].

The two glucosyltransferases ZmBX8 and ZmBX9 are 
also members of a multigene family and the twenty clos- 
est homologs from Avena strigosa, O. sativa, S. cereale, 
T. aestivum, S. bicolor and S. italica were analyzed. As 
ZmBx8 was located between Bx2 and Bx5 where no synteny
conservation was observed (see above), we focused
on synteny blocks containing ZmBx9. However, the
corresponding blocks in other genomes did not contain any
genes homologous to ZmBx9. The reconstructed phylogenetic
tree underlined the strong relationship between
ZmBX8 and ZmBX9 (74.4% identity) and the presence 
of one close paralog in S. cereale (BAJ07107) and four in 
T. aestivum (BAJ07089, BAJ07091, BAJ07092 and 
BAJ07090) (Figure 9). These four proteins have been 
recently described as the glucosyltransferases involved 
in the DIMBOA-biosynthetic pathway in wheat [63]. 
The closest relatives to these glucosyltransferases were 
from S. italica (Si013725, Si015361 and Si013705), S. 
bicolor (Sb09g028320), O. sativa (Os11g0441500 and
Os11g0444000) and A. strigosa (UGT710E5) (Figure 9).
However, none of these homologs were included in
blocks of conserved synteny with the ZmBx9 region.

Discussion 

Our results provide the first phylogenomic analysis of a 
biosynthetic gene cluster in plants. We have mined the
complete genomes of Poaceae that are currently available,
and all available sequences for genes related to the known
Bx genes of maize. Our study included both species that
produce benzoxazinoids and species that do not (Figure 7).
Our results are relevant to the origin of benzoxazinoids,
the evolution of the four key P450 genes of this cluster, and
the chromosomal arrangement of this cluster.

Figure 6 Phylogenetic tree and intron map of BX1, IGL, TSA and TSA_like sequences in Poaceae. Maximum likelihood tree with branch
lengths drawn to scale in terms of the number of substitutions per site. Nodes with dots indicate duplication events. The locations of phase 0
Arabidopsis lyrata are closely related but are non-Poaceae sequences. Sequences were named according to their closest homolog in maize and
IGL numbers were arbitrarily attributed.

Origin of benzoxazinoids 

Our analysis supports the hypothesis of Frey et al. [20] 
that the pathway of benzoxazinoid biosynthesis in Poaceae
is monophyletic, at least until DIBOA-glucoside. Indeed,
the phylogeny of the enzymes involved in its
synthesis (BX1 to BX5, and the glucosyltransferases BX8
and BX9) is congruent in benzoxazinoid producers for 
which the genes encoding these enzymes are known. It 
is likely that a basal ancestor of Poaceae in the Late 
Cretaceous, i.e. nearly 70 MYA [64], produced DIBOA 
glucoside. The origin of the last two steps, hydroxylation 
(BX6) and methylation (BX7) is less clear, because 
sequences orthologous to the maize genes are not currently
available. Benzoxazinoids have also been detected
in single isolated species of two distantly related orders of
dicots, the Ranunculales and the Lamiales [6]. The
dicots diverged from the monocots about 150 to 300
million years ago [65]. As in Poaceae, the first step
involves an indole-glycerol phosphate lyase (IGL),
separating the DI(M)BOA biosynthetic pathway from primary
metabolism. However, duplications leading to Igl were
independent events in monocots and dicots [66]. The second
step of the pathway leading to the production of
indolin-2-one from indole is a P450-dependent activity in 
the DIBOA-producing dicots [66] as in Poaceae, but as no 
CYP71C genes are found in dicots, the genes responsible 
cannot be orthologs. Independent evolution of benzoxazi- 
noid biosynthesis in monocots and dicots is therefore 
most likely. Independent evolution was also demonstrated
recently for the biosynthesis of cyanogenic glucosides, both
between monocots and dicots [28] and between plants and insects [67].

Duplications and neofunctionalization of the CYP genes 

The phylogenetic relationship of each of the four BX-type
P450s is robust and the Bx genes of the CYP71C
subfamily in maize, wheat and wild barley are thus
out-paralogs, their duplications occurring before
speciation events. Interestingly, the intron-exon pattern of
the CYP71C genes is not phylogenetically informative 
because of the many independent intron losses. The 
common ancestral origin of DIBOA glucoside biosynthesis
in Panicoideae and Pooideae, and the availability
of genome sequences from the two lineages allowed us
to examine in greater detail the origin and evolution of
the four P450 genes, the Bx2-Bx5 genes.

Although the similarity of the sequences and their 
clustering in maize would suggest that they are the prod- 
uct of a simple series of successive tandem duplications 
[27], our analysis shows a more complex evolutionary 
history. The first P450 gene, Bx2, is a member of a clade 
that contains many CYP71C sequences from a variety of 
Poaceae, including non-benzoxazinoid producers. 
Within this branch, all sequences from wheat and barley 
form a strongly supported monophyletic clade and all 
these sequences are biochemically characterized as en- 
coding BX2 enzymes [8,16,19]. The other sequences, 
from rice, sorghum and millet have not been function- 
ally characterized to date. These Bx2-like sequences 
might be active in other secondary metabolic pathway(s) 
in non-benzoxazinoid producers. We reconstructed with 
Codeml the sequence of the ancestor of the BX2/BX2-like
clade, synthesized it and produced it in yeast to test
our hypothesis of the original biochemical properties of
BX2. However, we were not able to express an enzymatically
functional protein (results not shown). The function
of the two close BX2 homologs of O. sativa,
CYP71C15 and CYP71C16, is unknown. Moreover, no
benzoxazinoids have been found in rice ([5] and unpublished
data in [20]). Maize BX2 catalyzes N-demethylation
of p-chloro-N-methylaniline [16] in
addition to the hydroxylation of indole, suggesting that 
the ancestral enzyme may have had some catalytic versa- 
tility as well. The original function of the BX2/BX2-like 
P450 ancestor thus remains hypothetical. 

If we assume that a common ancestor encoded a BX2- 
like protein catalyzing indole hydroxylation, then its dupli- 
cation could explain the expansion of the pathway from 
indole to DIBOA (Figure 10), by a series of non-successive 
duplications. This sequence of events is based on our 
phylogenetic analysis, and on the catalytic properties of 
the current representatives of each branch in maize, wheat 
and barley [8,16,17,19]. The first duplication led to "a
more recent BX2 ancestor" that maintained a C2 hydroxylase
activity in all benzoxazinoid producers until the
present time. Its duplicate was the common ancestor of
BX3/4/5 whose neofunctionalization led to a new C3
hydroxylase activity that has been maintained in BX3.
The BX3/4/5 ancestor was then duplicated first to give
BX4 neofunctionalized to catalyze oxidative ring expansion
and one more time leading to the present BX5
neofunctionalized to an N-hydroxylase. In this scheme
(Figure 10) Bx3 was duplicated twice, giving first Bx4
then Bx5, so that the pathway was not elongated by
successive tandem duplications of the gene encoding the
last step. The relative orientation and distance between
the Bx2-5 genes in maize (Figure 5B) also suggest
inversions and rearrangements with only Bx3 and Bx4
present as a tandem array. The neofunctionalization of
each new duplicate subtly modified the active site 
resulting in the specific regioselectivity of oxidation on 
the indole-like substrate (Figure 10). 
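
The order of events proposed in this scheme can be written out step by step. The sketch below is only an illustration of the sequence of duplications as described in the text: each event replaces a parent lineage by the copy that retains its activity and adds a neofunctionalized copy. The helper name `duplicate` and the lineage labels are hypothetical and are not part of the phylogenetic analysis itself.

```python
# Illustrative sketch of the duplication scheme described in the text.
# The helper `duplicate` and the lineage labels are hypothetical.

def duplicate(lineages, parent, kept, new):
    """Replace `parent` by the copy retaining its role (`kept`)
    and append the neofunctionalized copy (`new`)."""
    if parent not in lineages:
        raise ValueError(f"{parent} is not a current lineage")
    updated = [kept if name == parent else name for name in lineages]
    updated.append(new)
    return updated


# (parent lineage, copy retaining the activity, new copy)
events = [
    # C2 hydroxylase retained in BX2; the new copy acquires C3 hydroxylation
    ("BX2-like ancestor", "BX2", "BX3/4/5 ancestor"),
    # C3 hydroxylase retained in BX3; BX4 catalyzes oxidative ring expansion
    ("BX3/4/5 ancestor", "BX3", "BX4"),
    # second duplication of the BX3 lineage; BX5 is an N-hydroxylase
    ("BX3", "BX3", "BX5"),
]

lineages = ["BX2-like ancestor"]
for parent, kept, new in events:
    lineages = duplicate(lineages, parent, kept, new)
    print(f"{parent} -> {kept} + {new}")

print(sorted(lineages))
```

Written this way, it is apparent that the last two events both start from the BX3 lineage rather than from the most recently created gene, which is the sense in which the duplications are non-successive.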

Neofunctionalization as understood here is restricted 
to the substrate specificity of each enzyme with the con- 
servation of their overall P450 characteristics. Both 
maize and wheat BX3 enzymes hydroxylate indolin-2-one
as well as 1,4-benzoxazin-3-one [15,16], suggesting
that the enzyme does not discriminate between its natural
substrate and its ring-expanded analog. The use of
a ring-expanded substrate (HBOA) is a feature of BX5 
substrate specificity that supports the origin of Bx5 
from Bx3. 



Sites under selection in the P450 enzymes 

The sequence identities at the amino acid level between 
BX2 to BX5 orthologs (e.g. from 76 to 79% identity be- 
tween maize and wheat sequences, 76 to 81% between 
maize and wild barley sequences) and the BX enzyme 
substrate specificities are very high [8,15-17]. This sug- 
gests that the neofunctionalization following duplication 
in the ancestral, basal poaceous species was accompan- 
ied by selection on only a few sites. This is indeed what 
our results show. The BX3, BX5 and BX4 clades in our 
phylogeny are under strong adaptive evolution and some 
sites are under positive selection. We have identified the 
specific sites in the maize BX2-5 proteins that have 
undergone positive selection and functional divergence 
(type I and/or type II). More than half of these sites
are localized between SRS1 and the I-helix and, in
particular, in SRS1, 2 and 4 and between the E and
G helices. These regions seem to have a major impact
on the catalytic properties and the evolution/divergence
of the BX2-5 enzymes. Such a non-homogeneous distribution
of sites along the protein was previously described in
a phylogenetic analysis of the CYP3 gene family [68].
That study proposed that SRS1, 5 and 6 were performing 
a universal CYP3A function and that SRS2, 3 and 4 were 
responsible in part for the functional differences among 
the enzymes of this family. In contrast, functional divergence
in the vertebrate CYP2 family is not clustered in
SRS regions but distributed all along the CYP2 alignment
[69].

We identified ten residues potentially important for
the substrate specificity of each of the four maize P450s
(Figures 3 and 4B). Furthermore, the docking of indole
in our model of the active site of maize BX2 identified
four residues in close contact with the substrate. One
was inside SRS6 and the three other sites were localized
in SRS2 and SRS4. The four sites we identified in maize
BX2, among which one (Ile 527) differs in the BX3-BX5
proteins, could explain the specificities of the enzyme.
Our study thus points to Ser 156, Cys 157, Met 234 and
Ile 527 as first candidates for mutagenesis approaches to
test their impact on the biochemical properties of maize 
BX2. 

The limited number of sites shown to be under selec- 
tion or standing out as important in our modeling is 
consistent with experimental evidence obtained with 
other P450 enzymes. Few amino acid changes are 
needed to significantly change substrate specificity of 
these enzymes. For example, four residues at positions 
117, 209, 365 and 481 of Mus musculus CYP2A5 are suf- 
ficient to determine the steroid substrate specificity [70]. 
The specificity of this P450 is influenced by the
hydrophobicity and size of these residues [71,72]. In the human
CYP2C19, three residues at positions 99, 220, and 221 
are key residues that determine the hydroxylase activity 
for omeprazole [73]. In CYP2B11 of Canis lupus, three
sites in putative SRSs (residues 114, 290 and 363) are
important for the enzyme substrate specificity and
regio/stereoselectivity [74]. In Papilio polyxenes, residues 116,
117, 371 and 484 of CYP6B1 are critical for substrate
binding affinity [75]. While the CYP2 and CYP6
enzymes are predominantly xenobiotic-metabolizing
enzymes with diffuse substrate specificity, the CYP71C
studied here are thought to have a tight biosynthetic 
function. For plant biosynthetic P450s, mutagenesis of 
the mint limonene hydroxylases from the CYP71D subfamily
showed that a single amino acid change is sufficient
to convert a C6- to a C3-hydroxylase [76]. The
limited number of critical residues identified in our study
is therefore a reasonable explanation for the subtle shifts in
substrate regioselectivity that accompanied the evolution
of the BX2-BX5 enzymes, but this will require experimental
confirmation.

Origin of the Bx biosynthetic cluster: founding event 

The assembly of a biosynthetic gene cluster in plants 
was discussed by Frey et al. [20] with the benzoxazinoid 
pathway as a prototype. They saw three essential and
sequential modules: a branchpoint reaction, chemical
modification leading to a biologically active compound,
and detoxification. Osbourn [26,77] assigned the
branchpoint reaction to a signature enzyme, and chemical
modification to tailoring enzymes, but did not recognize
the importance and integrality of detoxification. Instead,
chromosomal clustering was seen as a way to prevent
the accumulation of toxic intermediates in a pathway 
[77]. Although the description of pathway assembly by 
the juxtaposition of three modules is a useful guide, our 
analysis suggests that in the case of the benzoxazinoid
pathway, clustering of the first two genes, Bx1 and Bx2,
is the key event. Furthermore, we propose that both BX1 
and BX2 are signature enzymes that only together con- 
stitute a branchpoint committing to benzoxazinoid bio- 
synthesis. We propose to call their clustering the 
"founding event" of the biosynthetic cluster. The evi- 
dence for this view can be developed as follows: 

Indole as a product of a branchpoint reaction is not a 
committed precursor of benzoxazinoids. Phylogenetic 
analysis shows that an initial duplication of TSA led to 
an IGL ancestor that was further duplicated to Bx1. TSA
is a subunit of tryptophan synthase in "primary metabol- 
ism", and current IGL enzymes are involved in the gen- 
eration of biologically active volatile indole [62,78]. Thus 
IGL and BX1 perform the same reaction, albeit with 
diverged catalytic properties [8,78,79]. Our study of 
TSA, TSA_like, Igl and Bx1 demonstrated that Bx1
originated before the radiation of Poaceae. Although Grun
et al. proposed that independent TSA gene duplication
events have created Bx1 function in maize and wheat on
one hand and in barley on the other [17], our phylogenetic
analysis clearly shows that this is not the case. The
sequence of H. lechleri named as "BX1" by Grun et al.
[17] clearly falls within the IGL clade, with strong
bootstrap support. Furthermore, its catalytic properties are
not characteristic of BX1 but rather of an IGL [17,79].
Its kcat/KM (31 mM⁻¹ s⁻¹) is much closer to that
of Z. mays IGL (23 mM⁻¹ s⁻¹) than to that of Z. mays BX1
(215 mM⁻¹ s⁻¹) [17]. The sequence is therefore an
ortholog of H. vulgare BAJ91226, and the H. lechleri Bx1
remains to be discovered. The absence of synteny
between ZmBx1 and other Poaceae genome regions is
quite surprising as we found synteny conservation for
TSA, TSA_like and Igl. The synteny of ZmBx1 and Bx2
[8] is the only conserved feature in all benzoxazinoid
producers and points to the uniqueness of this clustering.
The phylogenetic position of BX2 is similar to that
of BX1, a sequence emerging from a background of sev- 
eral duplication events and remarkable only because it 
forms a monophyletic clade with the wheat and wild 
barley enzymes of identical function. Both BX1 and BX2 
have close homologs that are not involved in benzoxazi- 
noid biosynthesis, so that they are signature enzymes 
catalyzing branchpoint reactions only when considered 
together. Initial clustering of both genes enabled their 
subsequent coevolution and divergence from Igl_like and 
Bx2-like genes, respectively. In this view, genomic
rearrangements that led to the random clustering of the
newly duplicated ancestral Bx1 and Bx2 genes represent
the "founding event" of the biosynthetic cluster. This key
innovation is therefore a structural one, and it makes 
sense because it distinguishes a biosynthetic cluster of 
genes from an assemblage of genes (not necessarily clus- 
tered) that form a biosynthetic pathway. The terms signa- 
ture/branchpoint and decoration/chemical modification 
can equally be applied to biosynthetic clusters as to geno- 
mically dispersed biosynthetic pathways, so a more spe- 
cific nomenclature is required. What then would be the 
second step? We propose to call it "elongation" in
preference to recruitment, to emphasize the genomic feature
over its functional aspect. 
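
Returning to the kinetic argument used earlier in this section to place the H. lechleri sequence with the IGL enzymes, the comparison can be made explicit as fold differences in catalytic efficiency. The sketch below uses only the three kcat/KM values quoted from [17]; the dictionary keys and the helper `fold_difference` are illustrative names, and a fold difference is just one simple way of expressing "closer to".

```python
# Catalytic efficiencies (kcat/KM, in mM^-1 s^-1) as quoted in the text from [17].
# Dictionary keys and the helper name are illustrative only.
efficiency = {
    "H. lechleri 'BX1'": 31.0,
    "Z. mays IGL": 23.0,
    "Z. mays BX1": 215.0,
}


def fold_difference(a: float, b: float) -> float:
    """Fold difference between two positive values (always >= 1)."""
    if a <= 0 or b <= 0:
        raise ValueError("values must be positive")
    return max(a, b) / min(a, b)


query = efficiency["H. lechleri 'BX1'"]
fold_vs_igl = fold_difference(query, efficiency["Z. mays IGL"])
fold_vs_bx1 = fold_difference(query, efficiency["Z. mays BX1"])

print(f"vs Z. mays IGL: {fold_vs_igl:.1f}-fold")
print(f"vs Z. mays BX1: {fold_vs_bx1:.1f}-fold")
```

On these values the H. lechleri enzyme differs from maize IGL by well under two-fold but from maize BX1 by several-fold, which is the quantitative basis for treating it as an IGL.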

Origin of the Bx biosynthetic cluster: elongation

Conservation of the Bx1-Bx2 synteny from maize to
wheat and rye [80] confirms that the founding event of
the biosynthetic cluster occurred in an ancestor of Poaceae.
Elongation of the cluster to Bx5 by the P450
duplications and gene rearrangements described above led to
a cluster of five genes in maize. Is this the ancestral state or
did the Bx3-5 genes join the cluster together, as a
separate event? In both rye and wheat, Bx1-2 and Bx3-5
form two distinct clusters. In rye, ScBx1 and ScBx2 are
located on chromosome 7R and ScBx3, ScBx4 and ScBx5
are on chromosome 5R. In wheat, TaBx1 and TaBx2 are
closely located on group-4 chromosomes and TaBx3,
TaBx4 and TaBx5 are closely located on group-5
chromosomes [80]. Rye 5R and 7R chromosomes have highly
conserved synteny with group-5 and group-4
chromosomes of wheat [80-82]. Nomura et al. [80] proposed
that the ancestral Bx cluster was split in a common an- 
cestor of rye and wheat. Moreover, the presence of 
microlinearity and partial orthology has been demon- 
strated between wheat group-7 chromosomes (contain- 
ing the glucosyltransferase of the DIMBOA-biosynthetic 
pathway) and maize chromosomes 1 and 4 (including re- 
spectively ZmBx8 and ZmBx9). In rye, the glucosyltrans- 
ferase is also found isolated on the 4R chromosome, 
consistent with the known synteny between rye and 
wheat [83]. The addition to the cluster of a glucosyl- 
transferase gene necessary to convert DIBOA (the prod- 
uct of BX5) to DIBOA-glucoside resulted from an 
ancient gene rearrangement, and our phylogenetic ana- 
lysis indicates the orthology of the rye, wheat and maize 
genes. We conclude that this cluster elongation was also 
an early event in an ancestral Poaceae species. It 
becomes difficult to distinguish detoxification as proposed 
by Frey et al. [20] and clustering (here of a
glucosyltransferase) to prevent toxic intermediate accumulation as
proposed by Osbourn [77]. Glucosyltransferases are inte- 
gral parts of the cyanogenic glucoside biosynthetic clus- 
ters [27]. It has been proposed that physical disruption of 
the components of the cyanogenic glucoside metabolon 
can be a way to diversify the products of the pathway, as
different intermediates are toxic to different targets [84].
There are therefore different ways to maintain integrity 
of a biosynthetic cluster: the genomic integrity that favors 
cosegregation of all components, and, at least for cyano- 
genic glucosides, the subcellular integrity as a metabolon. 
Is the glucosyltransferase the "last" enzyme in DIBOA- 
glucoside biosynthesis acting on the product of BX5? 
This is traditionally shown (Figure 1), and does not ad- 
dress the question of the earlier intermediates. Yet the 
products of BX3 and BX4 are observed as glucosylated 
metabolites in maize [85, Dutartre et al., in preparation],
so that the contribution of a glucosyltransferase to the 
biosynthetic cluster may have preceded the last two 
duplications of Bx3. The maize glucosyltransferases have 
significant activity towards HBOA, the product of BX4 
[10]. Significantly, Bx8 is closest to Bx1 and Bx2 in the
cluster (Figure 5B) and is the ortholog of the rye and wheat
genes. The wheat sequences result from hexaploidization, 
with one duplication in the B genome [83]. Bx9 is a re- 
cent duplicate of Bx8 in the maize lineage as shown by 
our phylogenetic analysis and that of Sue et al. [83]. It is 
not located in the cluster, and its catalytic properties [10] 
indicate that it has lost considerable activity toward 
DIBOA. Following the Bx8/Bx9 duplication, the sequence 
divergence of Bx9 and its new location on another 
chromosome probably led to a new physiological role dif- 
ferent from benzoxazinoid biosynthesis. The lack of QTL 
involved in DIMBOA synthesis associated with the Bx9 
region [86] supports this conclusion. 

Further elongation of the cluster corresponds to the 
aromatic hydroxylation and methylation of DIBOA glu- 
coside by BX6 and BX7 [12]. The evolutionary history of 
this elongation, and indeed of the further methylation to 
HDMBOA-glucoside is difficult to ascertain at present, 
because Bx6 and Bx7 have not been sequenced in other 
benzoxazinoid producers, and the last methyltransferase 
gene is still unknown. Bx6 has a close paralog on 
chromosome 2, and Bx7, while close to the Bx cluster, is 
about 35 cM distant. Intriguingly, the closest homologs 
of Bx6 and Bx7 in S. italica are located on scaffold 7, in 
close proximity to four P450 genes: CYP71C81 and
the cluster of CYP71C88, 89 and 92. The latter is
orthologous to the maize CYP71C36, 56, 57 cluster on
chromosome 2. While the function of these genes is currently
unknown, it is tempting to speculate that Bx6 and Bx7 
are moonlighting in a different biosynthetic cluster. 
Their position on the outside of the Bx cluster may have 
prevented them from being lost when the S. italica
ancestor lost the Bx1-Bx5 genes. We note that the Km of
BX6 and BX7 towards their benzoxazinoid substrates is 
the poorest of all BX enzymes [12], and as they take glu- 
cosylated substrates and not their aglycone, the aglycone 
contribution to substrate specificity may be weak,
supporting the alternative function hypothesis.

There are several examples of plant gene clusters
located close to the tip of the chromosome, as is
commonly found in actinomycetes and ascomycetes [26].
This position is particularly favorable to adaptive evolu- 
tion and to coordinated regulation [26,87]. The presence 
of genes in a cluster would favor their co-segregation 
and thus favor the rapid evolution of the linked genes. 
As no Bx cluster is found in S. bicolor, O. sativa and H. 
vulgare, it is likely that the original, complete cluster was 
lost in a single evolutionary event [80]. The chromo- 
somal position of the Bx cluster may have favored 
through cosegregation the subsequent rearrangements of 
the Bx8, then Bx6 and Bx7 genes in close proximity. 
Additional genome sequences from benzoxazinoid pro- 
ducers may provide additional evidence for this se- 
quence of events. Our analysis suggests that the key 
factor in the origin of biosynthetic gene clusters in 
plants, and perhaps in other organisms, is a "founder 
event" where the first two genes originating from ran- 
dom duplications and rearrangements find themselves 
linked and commit to a new pathway. Whether
co-regulation or co-segregation is the most important result
of this clustering remains to be ascertained. The genetic 
arguments for the primacy of co-segregation have long 
been known. In higher eukaryotes, the evidence and 
mechanisms of co-regulation of recently rearranged 
genes are less well established. 

Conclusions 

Our phylogenomic analysis of the origin of the Bx 
cluster in maize shows that the first two closely linked 
genes of the benzoxazinoid pathway are located at a 
chromosomal region that has no synteny conservation 
with the genomes of other Poaceae beyond Bx1/Bx2
themselves, and is therefore unique. Rearrangements
following duplications of an IGL/TSA gene and of a
CYP71C gene resulting in the clustering of the new
copies (Bx1 and Bx2) constitute the founding event of
the biosynthetic cluster. This founding event is a
genomic character, different and perhaps more general
than "branchpoint reaction" [20] or "signature enzyme"
[26,77] that denote biochemical characters that would
not adequately describe the importance of the tight
clustering of Bx1 and Bx2. Elongation of the cluster
involved duplications of a Bx2-like CYP71C gene and
neofunctionalizations that involved positive selection at
a few distinct sites of these P450 enzymes. At least one
glucosyltransferase gene was recruited into the pathway 
and rearranged into the cluster. Our data are consist- 
ent with our current understanding of biosynthetic 
clusters in plants [20,27,28,86], but highlight the
importance of the founding event in seeding a
biosynthetic cluster.